[54] Method for Eliminating Pipeline Control Related Delay

[57] Abstract: 

The invention discloses a method for efficiently eliminating the pipeline control related
delay, so as to increase the performance of a microprocessor with simpler hardware and
lower power consumption. The technical solution is as follows. A compiler determines all
possible transferring target addresses for a branch type instruction and inserts a prefetch
instruction. All subsequent instructions of the present instruction are read in advance by
two instruction fetch components, but only the instructions provided by one of the two
instruction fetch components are selected by a selector, based on the decoding results of
the present branch type instruction, for decoding and execution. There are three prefetch
instructions, namely fetch addr1 & addr2, fetch addr, and fetch stack, which correspond to
the various branch type instructions and are executed by the instruction fetch component.
The instruction prefetch is performed in basic blocks. The invention has the advantages of
lower power consumption and hardware complexity, a high elimination rate of the control
related delay, a high performance-to-cost ratio, and fewer useless prefetched instructions.

Method for Eliminating Pipeline Control Related Delay
Field of the Invention 

The present invention generally relates to a method for eliminating the pipeline
control related delay in the microprocessor design, particularly in the design of an
embedded microprocessor with low power consumption and complexity of hardware.

Description of the Related Art 

At present, pipeline control related delay in the microprocessor design is eliminated
mainly in two ways, i.e., branch prediction and delayed branch. The history of
program running is utilized in the branch prediction method to predict
whether the next branch transfer will succeed or not based on the statistic information
of the execution results of the branch type instruction. The effect of the branch
prediction depends upon its accuracy, and furthermore, is closely related to the
overhead caused by the branch prediction. The pipeline control related delay
depends upon the configuration of the pipeline, the prediction method and the
restoration policy adopted after a prediction error occurs. This method has a
disadvantage in that it requires great hardware support, such as prediction components
and restoration components used after a prediction error occurs, resulting in high
overhead and power consumption. The basic principle of the delayed branch method lies in
that instructions unrelated to the branch are executed during the control
related pause cycle so as to cover the cycle. This method also has a disadvantage in
that it is impossible to fill all the branch delay slots and to ensure that all the filling
instructions are necessarily executed. When instructions that are not necessarily executed
are called, improved performance is not provided in a real sense.

Embedded microprocessors, which are mainly deployed in areas such as domestic
appliances, cell phones, microcontrollers, and the like, demand low power
consumption. However, none of the above methods is a satisfactory solution, as they cannot
substantially improve performance, either because they fail to meet the requirement
for low power consumption and complexity of hardware or because they are not
capable of efficiently eliminating the control related delay. These methods cannot
efficiently eliminate the control related delay in the pipeline even in the general
purpose microprocessor design.

Summary of the Invention 

The technical problem the present invention is to solve is to efficiently eliminate the
pipeline control related delay in the embedded microprocessor with low complexity of
hardware and power consumption, to thereby improve the performance of the
microprocessor.

The technical scheme is as follows. A compiler in the compiling module determines
all possible transferring target addresses for a branch type instruction and inserts a
prefetch instruction. Two instruction fetch components are disposed in the
instruction fetch module to read all subsequent instructions of the present instruction
in advance. A selector is disposed in the instruction decoding and execution module
to select instructions provided by one of the two instruction fetch components based
on the decoding results of the present branch type instruction for decoding and
execution by the instruction decoding and execution module, to thereby eliminate the
pipeline control related delay. It has not been reported that the pipeline control
related delay is eliminated with this method at home or abroad.

In the invention eight terms are involved, i.e., pipeline, branch type instruction,
pipeline related, control related, control related delay, instruction prefetch, prefetch
instruction, and basic block, which are respectively defined as:

1) Pipeline, i.e., the specific realization of the instruction execution pipelining 
technique which divides the instruction execution procedure of a microprocessor into 
a number of sub-procedures with each sub-procedure able to be effectively executed 
on its special functional section simultaneously with other sub-procedures. The 
number of the stages that a pipeline can be divided into varies in different 
microprocessor designs. A pipeline generally includes five stages: instruction fetch, 
decode, execution, memory access, and write-back. 

2) Branch type instruction: generally referring to all the instructions that can change
the value of a program counter. It falls into four categories: conditional transfer 
instruction, direct unconditional transfer instruction, indirect unconditional transfer 
instruction, and procedure return instruction such as a return statement. 

3) Pipeline related: referring to pipeline pause caused by the interdependency between 
instructions. 

4) Control related: referring to the pipeline related that is caused by a branch type
instruction.

5) Control related delay: referring to the number of pipeline pause clock cycles that
result from control related.

6) Instruction prefetch: referring to the operation that an instruction is fetched from a 
memory in advance prior to being executed. 

7) Prefetch instruction: referring to an instruction for performing prefetch. 

8) Basic block: a basic unit of a program with only one entry, i.e., the first statement 
of the basic block, and only one exit, i.e., the last statement of the basic block. A 
program can always be divided into a number of basic blocks, each of which, in turn, 
includes at least two instructions with the one at the exit being branch type. 

The execution procedure is as follows: 

1. A compiler in the compiling module determines all possible transferring target
addresses for a branch type instruction and inserts a prefetch instruction. 

2. When the program begins running, one of the two instruction fetch components 
reads in the first basic block while the other stays idle. 

For each basic block in the program,

a) The instruction fetch component that reads in the basic block sequentially reads
in each instruction in the basic block and determines whether it is a prefetch
instruction. The fetch component sends it to the other instruction fetch
component for execution if the presently read-in instruction is a prefetch
instruction; otherwise, the fetch component sends the instruction to the
decoding component in the instruction decoding and execution module;

b) After the last instruction of the basic block, i.e., the branch type instruction,
has finished decoding, a selector in the instruction decoding and execution module
selects a basic block in an instruction fetch component as the subsequent basic
block based on the decoding results;

(i) Assuming that the two instruction fetch components are IF0 and IF1,
respectively, and the instructions in IF0 are currently being executed,
then IF1 executes the prefetch instruction and prefetches the
subsequent target instruction for the present basic block. When the
instructions are executed sequentially, that is, the instruction that ends
decoding is not a branch type instruction or is a branch type
instruction whose transferring condition is "False", the instructions in
IF0 are selected for decoding; when the instruction that ends decoding
is a branch type instruction whose transferring condition is "True",
the instructions in IF1 are selected for decoding.
(ii) If at present the pipeline is executing the instructions in IF1, then the
selection policy is just the opposite, that is, if the instructions are
executed sequentially according to the program, i.e., the instruction
that ends decoding is not a branch type instruction or is a branch type
instruction whose transferring condition is "False", the instructions in
IF1 are selected for decoding; when the instruction that ends decoding
is a branch type instruction whose transferring condition is "True",
the instructions in IF0 are selected for decoding.

Accordingly, the two instruction fetch components work in parallel, with one
providing instructions for the instruction execution component and the other
performing instruction prefetch, so that all the possible branch target instructions have
already been stored in the corresponding instruction fetch components before the
decoding of the branch type instruction ends.
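The selection policy of items (i) and (ii) can be illustrated with a minimal Python sketch. The function names and the representation of a decoding result as a pair of booleans are assumptions made for illustration only; the selector itself is described here as a hardware component.

```python
def select_fetch_component(active, is_branch, taken):
    """Return the index (0 for IF0, 1 for IF1) that supplies the next instructions.

    active    -- index of the fetch component currently feeding the decoder
    is_branch -- True if the instruction that ends decoding is branch type
    taken     -- True if its transferring condition is "True"
    """
    if active not in (0, 1):
        raise ValueError("active must be 0 (IF0) or 1 (IF1)")
    # Only a branch whose transfer succeeds switches to the other component;
    # the other component then holds the prefetched target block.
    if is_branch and taken:
        return 1 - active
    return active


def trace_selection(decode_results, start=0):
    """Apply the policy to a sequence of (is_branch, taken) decoding results."""
    active = start
    history = []
    for is_branch, taken in decode_results:
        active = select_fetch_component(active, is_branch, taken)
        history.append(active)
    return history
```

For example, starting from IF0, a non-branch instruction, a branch that transfers, a branch that fails to transfer, and a second branch that transfers select IF0, IF1, IF1, and IF0 in turn.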

Compared with conventional compilers, the compiler of the invention has two
additional special functions, namely, determining all the possible transferring
addresses for the branch type instruction and inserting a prefetch instruction into the
basic blocks according to the various categories of the branch instructions. The flow
for the compiler to insert the prefetch instruction runs as follows. The compiling
routine reads in turn each instruction from the program code and, when a branch type
instruction appears, which indicates that the end of the present basic block is reached,
inserts a corresponding prefetch instruction behind the first instruction of the basic
block to which the branch type instruction belongs according to the category of the
branch type instruction.

The invention designs three prefetch instructions, i.e., fetch addr1 & addr2, fetch addr,
and fetch stack, corresponding to the various branch type instructions that mainly fall
into four categories.

1) Conditional branch instruction. For this instruction, the transferring may succeed
or fail and there are two subsequent basic blocks, both of which should be
prefetched for the present basic block. There are two possible transferring
addresses, one stored in the instruction and the other being the address of the
following instruction. The prefetch instruction fetch addr1 & addr2 is now
inserted behind the first instruction of the basic block to which the branch
instruction belongs, to prefetch the two basic blocks that begin at the addresses
addr1 and addr2. addr1 is obtained by decoding the present branch type
instruction and addr2 is the address of the following instruction with a value of the
sum of the branch type instruction address and instruction length (in terms of bytes).

2) Direct unconditional branch instruction, where the transferring is always
successful and there is only one subsequent basic block, so the transferring target
address can be obtained in compiling, and only the target basic block is to be
prefetched. There is only one possible transferring address, which is stored in the
instruction. The prefetch instruction fetch addr is now inserted behind the
first instruction of the basic block to which the branch instruction belongs, to
prefetch the basic block that begins at the address addr, which is obtained from the
branch type instruction decoding.

3) Indirect unconditional branch instruction, where the transferring is always
successful, but the transferring target address cannot be obtained in compiling as it
is stored in a register. This category of branch type instruction is not processed.

4) Procedure return statement, where the transferring is always successful and there
is only one subsequent basic block. These statements often occur at the
procedure calling returns. The present invention stores the procedure calling
return address (i.e., the prefetch address) using a stack as nesting may occur in
procedure calling. For each procedure calling, the return address is stored at the
stack top location, from where the prefetch address is obtained during prefetching.
The prefetch instruction fetch stack is now inserted behind the first instruction of
the basic block to which the procedure return instruction belongs, to prefetch the
basic block that begins at the stack top location address.

Different from other instructions, the prefetch instruction is executed by an instruction
fetch component. The prefetch instruction coding may vary corresponding to
various RISC (Reduced Instruction Set Computers) instruction sets, but all fall into
the scope of the present invention as long as it is utilized to realize the same function
as the present invention. The prefetch is done in basic blocks and all the prefetched
instructions should be executed except some of the subsequent instructions that are
prefetched when the conditional branch instruction transfer fails.
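The correspondence between the four branch categories and the three prefetch instructions can be sketched in Python as follows. The enumeration, the function name, and the textual rendering of the prefetch instructions are hypothetical; as noted above, the actual coding depends on the instruction set.

```python
from enum import Enum, auto
from typing import Optional


class BranchKind(Enum):
    CONDITIONAL = auto()
    DIRECT_UNCONDITIONAL = auto()
    INDIRECT_UNCONDITIONAL = auto()
    PROCEDURE_RETURN = auto()


def choose_prefetch(kind: BranchKind, branch_addr: int, instr_len: int,
                    target: Optional[int] = None) -> Optional[str]:
    """Return an illustrative prefetch instruction, or None if none is inserted."""
    if kind is BranchKind.CONDITIONAL:
        addr1 = target                   # obtained by decoding the branch
        addr2 = branch_addr + instr_len  # address of the following instruction
        return f"fetch {addr1:#x} & {addr2:#x}"
    if kind is BranchKind.DIRECT_UNCONDITIONAL:
        return f"fetch {target:#x}"      # single target, known in compiling
    if kind is BranchKind.PROCEDURE_RETURN:
        return "fetch stack"             # target read from the stack top at run time
    return None                          # indirect branch: not processed
```

A conditional branch at address 0x100 with a 4-byte instruction length and target 0x200 would yield fetch 0x200 & 0x104 under these assumptions.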

[...] instructions have been read in before the branch type [...] subsequent
instructions [...] instruction decoding [...] for decoding and execution [...].
When [...] is realized on a [...] simulator (a standard CPU simulator) and tested
with the SPECint95 benchmark set provided by the System Performance Evaluation
Cooperative Consortium, the probability that the branch [...] have already been
read in when the branch type instruction [...] 3% [...] indirect [...].
Otherwise [...] percentage [...] procedure [...].

The present invention has the following advantages:

1) Low complexity and power consumption of hardware realization. Compared
with branch prediction technology, the present invention utilizes simple logic
control components, omitting complex branch prediction hardware and the
restoration components for prediction errors, and therefore may greatly reduce the
difficulty and complexity of hardware realization.

2) High elimination rate of the control related delay. The present invention selects
branch target addresses directly according to the decoding results of the branch
type instruction, so the instruction prefetch may ensure that most of the
instructions have already been read in before the branch instruction decoding ends,
and therefore the vast majority of the control related delay will be eliminated.

3) The prefetch is performed in basic blocks and the subsequent basic blocks are
read in while the present basic block is executed; therefore, the
prefetch instruction is issued at an early time, which ensures adequate time for
instruction prefetch. Meanwhile, all the prefetched instructions, except
those subsequent instructions that are prefetched when the conditional branch
instruction transfer fails, are to be executed, which effectively reduces the amount
of useless instruction prefetch.

The present invention meets the objects of effectively eliminating pipeline control 
related delay and improving the performance of the embedded microprocessor with 
low complexity of hardware and low power consumption. 

Brief Description of the Drawings

Figure 1 is a flow chart of a compiler of the present invention inserting a prefetch
instruction;
Figure 2 illustrates the general building block of logic of the present invention;
Figure 3 is a space vs. time diagram of the non-branch type instruction of a
conventional microprocessor in a five-stage pipeline;
Figure 4 is a space vs. time diagram of the branch type instruction of a conventional
microprocessor in a five-stage pipeline;
Figure 5 is a space vs. time diagram of the branch type instruction in a five-stage
pipeline after the present invention is adopted;
Figure 6 illustrates the testing results of the present invention with respect to the
SPECint95 benchmark;
Figure 7 illustrates a comparison of a control related delay eliminating method with
the present invention and other methods.

Detailed Description of the Invention

Fig 1 is a flow chart of a compiler of the present invention inserting a prefetch 
instruction. The compiling program sequentially reads each instruction in the
program code. The appearance of a branch type instruction indicates that the end of
the present basic block is reached and then the compiler inserts a corresponding 
prefetch instruction behind the first instruction of the basic block the branch type 
instruction belongs to according to the category of the branch type instruction. The 
compiler can always come across a branch type instruction since the program always 
ends with a procedure return statement. 
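The flow of Fig 1 can be illustrated with a minimal Python sketch. The instruction strings, the opcode names (beq, jmp, jr, ret) and the table PREFETCH_FOR are hypothetical stand-ins for a real instruction set, and the operands of the prefetch instructions are left symbolic. The sketch relies on the definition given earlier that every basic block contains at least two instructions.

```python
# Hypothetical opcode table: branch opcode -> prefetch instruction to insert.
# None marks an indirect unconditional branch, which is not processed.
PREFETCH_FOR = {
    "beq": "fetch addr1 & addr2",  # conditional branch
    "jmp": "fetch addr",           # direct unconditional branch
    "jr": None,                    # indirect unconditional branch
    "ret": "fetch stack",          # procedure return
}


def insert_prefetches(program):
    """Scan the program once and insert a prefetch instruction behind the
    first instruction of every basic block that ends in a processed branch."""
    out = []
    block_start = 0  # index in `out` of the first instruction of the current block
    for instr in program:
        out.append(instr)
        opcode = instr.split()[0]
        if opcode in PREFETCH_FOR:        # a branch ends the present basic block
            prefetch = PREFETCH_FOR[opcode]
            if prefetch is not None:
                out.insert(block_start + 1, prefetch)
            block_start = len(out)        # the next instruction opens a new block
    return out
```

Applied to the five-instruction program add, sub, beq, mov, ret, the sketch places fetch addr1 & addr2 directly after add and fetch stack directly after mov.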

Fig 2 illustrates the general building block of logic of the invention, which comprises
a compiling module, an instruction fetch module, and an instruction decoding and
execution module.

The compiling module is designed mainly to determine all the possible transferring 
addresses for the branch type instruction and insert a prefetch instruction according to 
the category of the branch type instruction. The compiler sequentially reads each 
instruction in the source program. When a branch type instruction is read, the 
compiler inserts a corresponding prefetch instruction behind the first instruction of the 
basic block the branch type instruction belongs to. This is done as follows. For a 
conditional branch type instruction, since there are two subsequent basic blocks and 
hence two corresponding prefetch addresses, prefetch instruction fetch addr1 & addr2
is inserted with addr1 obtained by decoding the branch type instruction and addr2
being the address of the following instruction with a value of the address of the
branch type instruction plus the instruction length (in terms of bytes). For a direct
unconditional branch type instruction, there is only one subsequent basic block and 
hence one corresponding prefetch address, and prefetch instruction fetch addr is 
inserted with addr obtained by decoding the instruction. For the procedure return 
statement, there is only one subsequent basic block and hence one corresponding 
prefetch address stored in the stack top location, and prefetch instruction fetch stack is 
inserted. The compiled program code is stored in a memory. 

The instruction fetch module is designed mainly to provide instructions to the
instruction decoding and execution module and to perform instruction prefetch, and the
module functions through two instruction fetch components. The instruction fetch
component IF0 reads instructions from the instruction Cache through port 0, while
the instruction fetch component IF1 reads instructions from the instruction Cache
through port 1. When the program begins running, the instructions
provided by IF0 are selected for decoding and execution and IF1 stays idle. A prefetch
instruction is read in by IF0 and then transferred to IF1 for IF1 to perform prefetch.
While the program is running, which instruction fetch component performs instruction
fetch and which performs instruction prefetch is determined by the decoding results of the
branch type instruction. If IF0 is responsible for instruction fetch and IF1 is
responsible for instruction prefetch, and the instruction that ends decoding is not a
branch type instruction or is a branch type instruction that fails to transfer, then the
two components do not change their operation; otherwise IF1 becomes responsible for
instruction fetch and IF0 for instruction prefetch. If IF1 is responsible for
instruction fetch and IF0 for instruction prefetch, and the instruction that
ends decoding is not a branch type instruction or is a branch type instruction that fails
to transfer, then the two components do not change their operation; otherwise IF0 becomes
responsible for instruction fetch and IF1 for instruction prefetch.
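How the fetching component separates prefetch instructions from ordinary instructions can be sketched in Python as follows. The function name and the convention that a prefetch instruction is a string beginning with "fetch" are assumptions for illustration; in hardware the distinction would be made on the instruction coding.

```python
def route_block(block, is_prefetch=lambda instr: instr.startswith("fetch")):
    """Split one basic block as the fetching component would.

    Returns (to_decoder, to_other_fetcher): ordinary instructions go to the
    decoding component, prefetch instructions go to the other fetch component.
    """
    to_decoder = []
    to_other_fetcher = []
    for instr in block:
        if is_prefetch(instr):
            to_other_fetcher.append(instr)  # executed by the idle fetch component
        else:
            to_decoder.append(instr)        # decoded and executed as usual
    return to_decoder, to_other_fetcher
```

Because the prefetch instruction never reaches the decoder in this scheme, it does not occupy a decoding or execution slot.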

The instruction decoding and execution module is mainly responsible for the 
instruction decoding and execution. The instruction that is decoded and executed is 
provided by an instruction fetch component in the instruction fetch module, wherein 
the decoding results of the branch type instruction are sent to the selector and the two 
instruction fetch components while those of the non-branch type instruction are sent 
to the instruction execution component for execution. The selector is responsible for
selecting the instructions provided by one instruction fetch component, according to the
decoding results of the branch type instruction, to be decoded and executed. The
selector selects the instructions in IF0 for decoding and execution when the program
begins running. While the program is running, the selection is made
based on the decoding results of the branch type instruction. If the instructions in IF0
are being used, and the instruction that ends decoding is not a branch type instruction
or is a branch type instruction that fails to transfer, then the selector continues using
the instructions in IF0; otherwise, it uses the instructions in IF1. If the instructions
in IF1 are being used, and the instruction that ends decoding is not a branch type
instruction or is a branch type instruction that fails to transfer, then the selector
continues using the instructions in IF1; otherwise, it uses the instructions in IF0.
During instruction execution, if a procedure call occurs, then the procedure return
address is stored in the stack top location so that it may be prefetched.
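The handling of procedure return addresses can be sketched in Python as follows. The class and method names are hypothetical, and popping the entry when the return is executed is an assumption of this sketch; the text above states only that the return address is stored at the stack top on each call and read from there when prefetching.

```python
class ReturnAddressStack:
    """Illustrative stack of procedure return addresses used by fetch stack."""

    def __init__(self):
        self._stack = []

    def on_call(self, return_addr):
        # Each procedure call stores its return address at the stack top.
        self._stack.append(return_addr)

    def prefetch_target(self):
        # fetch stack reads the stack top to find the block to prefetch.
        return self._stack[-1] if self._stack else None

    def on_return(self):
        # Assumed behaviour: the entry is removed when the return is executed,
        # so the enclosing call's return address becomes the new stack top.
        return self._stack.pop()
```

With nested calls, the innermost return address is prefetched first, and the outer one becomes available again after the inner procedure returns.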

Now refer to Fig 3. Consider that the pipeline is divided into five stages: instruction
fetch (IF), instruction decoding (ID), execution (EX), memory access (MEM) and 
write back (WB), wherein the instruction p is the first instruction that is executed after 
the instruction i is executed, the instruction p+1 is the second instruction that is 
executed after the instruction i is executed, and the rest may be deduced by analogy. 
When the instruction i is decoded, which is not branch type, the instruction fetch 
component will read instruction p at the same time, and therefore, the ID stage of the 
instruction i coincides with the IF stage of the instruction p, and the EX stage of the 
instruction i coincides with the ID stage of the instruction p, causing no control 
related delay.

Now refer to Fig 4. The instruction i is branch type. In the same pipeline as in Fig 3,
the address of the instruction p cannot be determined until the decoding of the
instruction i is finished. As the EX stage of the instruction i coincides with the IF
stage of the instruction p, there is a control related delay of one clock cycle.

Referring to Fig 5, consider that the instruction i is branch type, the instruction i1 is
the instruction executed when the branch transfer fails, the instruction i2 is the
instruction executed when the branch transfer succeeds, the instruction q is the second
instruction that is executed after the branch type instruction, the instruction q+1 is the
third instruction that is executed after the branch type instruction, and the rest may be
deduced by analogy. After the present invention is adopted, the two instruction fetch
components simultaneously read the instructions i1 and i2, respectively, while the
instruction i is being decoded, and the instruction i1 or i2 is selected for decoding
after the decoding of the instruction i is finished. The EX stage of the instruction i
coincides with the ID stage of the instruction i1 or i2, thereby eliminating the existent
control related delay.
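The stage alignments described for Figs 3 to 5 can be checked with a short Python sketch. It assumes an idealized five-stage pipeline in which each stage takes exactly one clock cycle; the function names are illustrative only.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def schedule(if_cycle):
    """Map each stage to its clock cycle for an instruction fetched at if_cycle."""
    return {stage: if_cycle + k for k, stage in enumerate(STAGES)}


def control_delay(branch, successor):
    """Cycles lost relative to fetching the successor right behind the branch."""
    return successor["IF"] - (branch["IF"] + 1)


i = schedule(1)                       # the instruction i

# Fig 3 and Fig 5: the successor is fetched while i is being decoded.
overlapped = schedule(i["ID"])

# Fig 4: the successor cannot be fetched until the decoding of i is finished.
conventional = schedule(i["EX"])
```

Under these assumptions control_delay(i, overlapped) is 0 and control_delay(i, conventional) is 1, matching the one-cycle delay of Fig 4 and its elimination in Fig 5.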

The present invention has already been successfully realized in the pipeline of the IP core
of the Yinhe TS-1 embedded microprocessor and can effectively eliminate the pipeline
control related delay. Fig 6 illustrates the testing results with the SPECint95 benchmark
set. The vertical axis indicates each benchmark in the SPECint95
benchmark set and the horizontal axis indicates the elimination rate of the
pipeline control related delay, with 88.80% for gcc, 83.32% for ijpeg, 88.55% for
compress, 88.69% for perl, 92.99% for m88ksim, 85.09% for li, 87.47% for vortex,
and 86.16% for go. The average elimination rate of the pipeline control related
delay is up to 87.8%.

Fig 7 illustrates the performance obtained with different methods for
eliminating the control related delay, wherein CPI (cycles per instruction) indicates the
average number of clock cycles needed to execute an instruction, and the
conditional branch delay, unconditional branch delay and average branch delay are all
counted in clock cycles, and wherein pipeline pause indicates that no branch delay
eliminating technique is applied, and double fetch components indicates that the
present invention is adopted. After adopting the present invention, the conditional
branch delay is reduced to 0.05, the unconditional branch delay to 0.09, the average
branch delay to 0.06, and the effective CPI value to 1.01, all of which are much lower than
those obtained with other methods.
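The relation between the average branch delay and the effective CPI can be illustrated with a short Python sketch. The model used here, effective CPI equals a base CPI plus the fraction of branch instructions times the average branch delay, is a common textbook approximation and is an assumption of this sketch; the base CPI of 1.0 and the branch fraction are likewise assumed, since neither is stated for Fig 7.

```python
def effective_cpi(base_cpi, branch_fraction, avg_branch_delay):
    """Textbook approximation: every branch adds its average delay to the CPI."""
    return base_cpi + branch_fraction * avg_branch_delay


def implied_branch_fraction(cpi, avg_branch_delay, base_cpi=1.0):
    """Branch fraction that would reconcile a CPI with an average branch delay."""
    return (cpi - base_cpi) / avg_branch_delay
```

Under this assumed model, an average branch delay of 0.06 and an effective CPI of 1.01 would correspond to roughly one instruction in six being a branch type instruction.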

Claims

What is claimed is: 

1. A pipeline control related delay eliminating method, wherein the general logic
structure includes a compiling module, an instruction fetch module, and an
instruction decoding and execution module, wherein a compiler in the compiling
module is responsible for determining all the possible transferring addresses for the
branch type instruction and inserting a prefetch instruction according to the category
of the branch type instruction; the instruction fetch module, responsible for, in
addition to providing instructions for the instruction decoding and execution
module, prefetching instructions, functions through two instruction fetch components
IF0 and IF1; a selector in the instruction decoding and execution module is
responsible for selecting the instructions provided by one of the two instruction fetch
components according to the decoding results of the present branch type
instruction and delivering the selected instructions to a decoding component for
decoding and an instruction executing component for execution; the whole
executive process being:

1) a compiler in the compiling module determines all the possible transferring 
addresses for the branch type instruction and inserts a prefetch instruction; 

2) when the program begins to run, one instruction fetch component in the 
instruction fetch module reads in the first basic block, while the other 
instruction fetch component stays idle; for each basic block in the program: 

a) the instruction fetch component that is responsible for reading in this basic
block sequentially reads in each instruction in the basic block and
determines whether it is a prefetch instruction; if the instruction that is
being currently read in is a prefetch instruction, then the prefetch
instruction is sent to the other instruction fetch component for it to
execute the prefetch instruction, or otherwise, the instruction is sent to a
decoding component for decoding;

b) after the last instruction in the basic block, i.e., the branch type
instruction, finishes decoding, a selector in the instruction decoding and
execution module selects a basic block in the instruction fetch component as
the subsequent basic block of the present basic block according to the
decoding results of the branch type instruction;

i. if the instructions in IF0 are being executed, then IF1 executes the
prefetch instruction to prefetch the subsequent target instruction
for the present basic block; when instructions are sequentially
executed, that is, the instruction that ends decoding is not branch
type or is a branch type instruction whose transferring condition
is False, the instructions in IF0 are selected for decoding; when a
branch type instruction is executed, that is, the instruction that
ends decoding is a branch type instruction whose transferring
condition is True, the instructions in IF1 are selected for
decoding;

ii. if the instructions in IF1 are being executed, then the selection
policy is just the opposite, that is, if instructions are sequentially
executed, that is, the instruction that ends decoding is not branch
type or is a branch type instruction whose transferring condition
is False, the instructions in IF1 are selected for decoding; when a
branch type instruction is executed, that is, the instruction that
ends decoding is a branch type instruction whose transferring
condition is True, the instructions in IF0 are selected for
decoding.

2. A pipeline control related delay eliminating method according to Claim 1, wherein
the flow of the compiler inserting a prefetch instruction is: 

the compiling program sequentially reads each instruction in the program 
code; if a branch type instruction is read, which indicates that the end of the 
present basic block is reached, the compiling program inserts a corresponding 
prefetch instruction behind the first instruction of the basic block to which the 
branch type instruction belongs according to the category of the branch type 
instruction. 

3. A pipeline control related delay eliminating method according to Claim 1, wherein
the prefetch instruction is designed corresponding to various branch type
instructions, including fetch addr1 and addr2, fetch addr, and fetch stack, and
different prefetch instructions are applied to different branch type instructions:

1) conditional branch instruction: the transfer may succeed or fail and there are
two subsequent basic blocks, and therefore, the two subsequent basic blocks
of the present basic block should be prefetched simultaneously; there are two
possible transferring addresses, with one address stored in the instruction and
the other being the address of the following instruction; now the prefetch
instruction fetch addr1 and addr2 is inserted behind the first instruction of the
basic block to which the branch type instruction belongs, to prefetch the two
basic blocks that begin at the addresses addr1 and addr2, respectively; addr1
is obtained by decoding the branch type instruction, and addr2 is the address
of the following instruction with a value of the sum of the address of the
branch type instruction and its length;

2) direct unconditional branch type instruction: the transfer is always successful
and there is only one subsequent basic block, and therefore the transfer target
address can be obtained and it is only required to prefetch the target basic
block; there is only one possible transfer address stored in the instruction;
now the prefetch instruction fetch addr is inserted behind the first instruction
of the basic block to which the branch type instruction belongs, to prefetch
the basic block that begins at the address addr; addr is obtained by
decoding the branch type instruction;

3) indirect unconditional branch type instruction: the transfer is always 
successful, but the transfer target address is usually unobtainable for 
compiling as it is stored in a register. Therefore, this instruction is not 
processed. 

4) Procedure return statement: the transfer is always successful and there is only
one subsequent basic block. This statement often appears at the procedure
call return. A stack is adopted to store the return address, i.e., the prefetch
address, for the procedure call as nesting may occur. At each procedure call
the return address is stored at the stack top location, from where it is obtained
in prefetching. At this point, the prefetch instruction fetch stack is inserted
behind the first instruction of the basic block to which the procedure return
instruction belongs to prefetch the basic block that begins at the stack top
location.

4. A pipeline control related delay eliminating method according to Claim 1, wherein 
the instruction prefetch is performed in basic blocks and all prefetched 
instructions will be executed except some of the subsequent instructions that are 
prefetched when the conditional branch instruction transfer fails. 